Survival analysis of localized prostate cancer with deep learning

In recent years, data-driven, deep-learning-based models have shown great promise in medical risk prediction. Using the large-scale Electronic Health Record data of the U.S. Department of Veterans Affairs, the largest integrated healthcare system in the United States, we have developed an automated, personalized risk prediction model to support clinical decision-making for patients with localized prostate cancer. The method combines the representational power of deep learning with the analytical interpretability of parametric regression models and can handle both time-dependent and static input data. To evaluate model performance comprehensively, we calculate the time-dependent C-statistic $C_{\text{td}}$ over 2-, 5-, and 10-year time horizons using either a composite outcome or prostate cancer mortality as the target event. The composite outcome combines the Prostate-Specific Antigen (PSA) test, metastasis, and prostate cancer mortality.
Our longitudinal model, Recurrent Deep Survival Machine (RDSM), achieved $C_{\text{td}}$ of 0.85 (0.83), 0.80 (0.83), and 0.76 (0.81), while the cross-sectional model, Deep Survival Machine (DSM), attained $C_{\text{td}}$ of 0.85 (0.82), 0.80 (0.82), and 0.76 (0.79) for the 2-, 5-, and 10-year composite (mortality) outcomes, respectively. In addition to estimating the survival probability, our method can quantify the uncertainty associated with the prediction. The uncertainty scores show a consistent correlation with the prediction accuracy. We find that PSA and prostate cancer stage information are the most important indicators in risk prediction. Our work demonstrates the utility of data-driven machine learning models in prostate cancer risk prediction, which can play a critical role in clinical decision systems.

The mixture weights $w_k$ are also learned jointly via the optimization procedure outlined below. Depending on whether the input data are time-dependent or not, $\Phi_\theta$ is either a Recurrent Neural Network (RNN) or a Multi-Layer Perceptron (MLP); we call the resulting models Recurrent Deep Survival Machine (RDSM) and Deep Survival Machine (DSM), respectively. In practice, we choose either Long Short-Term Memory (LSTM) [1] or Gated Recurrent Unit (GRU) [2] as the concrete realization of the RNN module in RDSM. Training proceeds by maximum likelihood estimation, which amounts to minimizing a loss function with two terms: the first is the uncensored loss and the second is the censored loss. Here $\alpha \in [0, 1]$ is a discount factor that we treat as a hyperparameter, and $w$ denotes the mixture weight. To mitigate the long-tail bias, we add $L_2$ regularization on $\beta_k$ during training. The final survival probability $S(t \mid X)$ is the weighted average over the $k$ distributions:

$$S(t \mid X) = \sum_{k} w_k \, S_k(t \mid X),$$

where $S_k(t \mid X)$ denotes the survival function of the $k$-th distribution.
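The mixture survival probability and the two-term likelihood described above can be illustrated with a minimal pure-Python sketch for Weibull components. It assumes the mixture weights and the per-component shape and scale have already been produced by the network; the function names, the Weibull parameterization, and the placement of the discount factor on the censored term are illustrative assumptions, not the authors' implementation, and the $L_2$ regularization is omitted.

```python
import math

def weibull_survival(t, shape, scale):
    """Survival function of one Weibull component: exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_density(t, shape, scale):
    """Density of one Weibull component (assumes t > 0)."""
    hazard = (shape / scale) * (t / scale) ** (shape - 1)
    return hazard * weibull_survival(t, shape, scale)

def mixture_survival(t, weights, shapes, scales):
    """S(t|X): weighted average of the component survival functions."""
    return sum(w * weibull_survival(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))

def mixture_density(t, weights, shapes, scales):
    """Weighted average of the component densities."""
    return sum(w * weibull_density(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))

def negative_log_likelihood(records, alpha=1.0):
    """Uncensored loss plus alpha times censored loss.

    records: iterable of (time, event, weights, shapes, scales), where
    event is True if the event was observed and False if censored.
    Assumption: alpha discounts the censored term.
    """
    uncensored, censored = 0.0, 0.0
    for t, event, weights, shapes, scales in records:
        if event:
            # Observed event: the density at the event time enters the likelihood.
            uncensored -= math.log(mixture_density(t, weights, shapes, scales))
        else:
            # Censored: only survival beyond the censoring time is known.
            censored -= math.log(mixture_survival(t, weights, shapes, scales))
    return uncensored + alpha * censored
```

In a trained model the weights, shapes, and scales would differ per patient because they depend on the input $X$; here they are passed in explicitly to keep the sketch self-contained.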

Hyper-parameter tuning & Model structure
We randomly hold out 15% of the training data as the validation set and perform hyper-parameter optimization for the two deep learning models, DSM and RDSM. The results reported in Fig. 3 of the main text are obtained on the test set using the models with the lowest loss on the validation set. We adopt a similar strategy for the two traditional machine learning models, GBM and RSF, and select the model parameters with the highest $C_{\text{td}}$ on the validation set. For the Cox model, we instead perform 5-fold cross-validation on the entire training set and report the test-set result for the parameter set with the highest average $C_{\text{td}}$ in the cross-validation. To maximize performance, we tune the hyper-parameters of each model separately for the two outcomes, i.e., composite and PC-mortality. The detailed DL model structure is as follows. For the composite outcome, RDSM uses a 2-layer LSTM with 64 hidden units in each layer as the neural network module and initializes 40 parametric regression models with the LogNormal distribution, while DSM uses a 2-layer MLP with 32 hidden units per layer and 90 parametric regression models following the Weibull distribution.
For the PC-mortality outcome, the only difference in RDSM is that the learning rate changes from 0.03 to 0.0075. For DSM, the number of parametric regression models becomes 60, and the learning rate changes from 0.002 to 0.0005. We implement our DL models in PyTorch and use the Adam optimizer with a batch size of 2048 during training.
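For reference, the reported settings can be collected in a small configuration table. The sketch below only restates the values given in the text; the class name, field names, and dictionary keys are hypothetical and do not correspond to any published code.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SurvivalModelConfig:
    """Hypothetical container for the hyper-parameters reported in the text."""
    network: str             # neural network module: "LSTM" or "MLP"
    num_layers: int
    hidden_units: int        # hidden units per layer
    num_distributions: int   # number of parametric regression models
    distribution: str        # "LogNormal" or "Weibull"
    learning_rate: float
    optimizer: str = "Adam"
    batch_size: int = 2048

# Composite outcome settings.
rdsm_composite = SurvivalModelConfig("LSTM", 2, 64, 40, "LogNormal", 0.03)
dsm_composite = SurvivalModelConfig("MLP", 2, 32, 90, "Weibull", 0.002)

# PC-mortality settings differ only in the fields named in the text.
rdsm_mortality = replace(rdsm_composite, learning_rate=0.0075)
dsm_mortality = replace(dsm_composite, num_distributions=60, learning_rate=0.0005)

CONFIGS = {
    ("RDSM", "composite"): rdsm_composite,
    ("RDSM", "pc_mortality"): rdsm_mortality,
    ("DSM", "composite"): dsm_composite,
    ("DSM", "pc_mortality"): dsm_mortality,
}
```

Deriving the PC-mortality entries with `replace` makes explicit that all other settings are carried over unchanged from the composite outcome.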
It is worth mentioning that, to ensure a fair comparison, we use the same parameters in the regional and subgroup analyses as in the general case. We expect that hyper-parameter tuning in each case would yield moderate performance increases for all models. The two traditional machine learning models, GBM and RSF, are computationally expensive, which precludes extensive hyper-parameter tuning. For example, training an RSF model on the whole training set can take more than 4 days, while RDSM and DSM require only a few minutes (60 epochs).
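The hold-out strategy used in this section (a random 15% validation split, followed by picking the candidate with the lowest validation loss or the highest validation C-statistic) can be sketched as follows. The helper names and the random seed are assumptions made for illustration, and the scores in the usage lines are made-up placeholders, not results from this study.

```python
import random

def split_train_validation(indices, validation_fraction=0.15, seed=0):
    """Randomly hold out a fraction of the training indices for validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    # The first n_val shuffled indices form the validation set.
    return shuffled[n_val:], shuffled[:n_val]

def select_best(scores, lower_is_better=True):
    """Return the candidate name with the best validation score.

    Use lower_is_better=True for a validation loss and
    lower_is_better=False for a concordance-type metric.
    """
    pick = min if lower_is_better else max
    return pick(scores, key=scores.get)

# Usage with placeholder values (hypothetical candidates and scores).
train_idx, val_idx = split_train_validation(range(1000))
best_by_loss = select_best({"candidate_a": 1.92, "candidate_b": 1.87})
best_by_c = select_best({"candidate_a": 0.78, "candidate_b": 0.81},
                        lower_is_better=False)
```

The same selection helper covers both criteria in the text: the deep learning models are chosen by validation loss, the traditional models by validation C-statistic.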

Subgroup Analysis Results
The stratification of the different age and race subgroups can be found in Fig. 2 of the main text.
Table S11. Race subgroup (other) analysis for the composite (left) and PC-mortality (right) outcomes.